A PARALLEL COUNTER AND A MULTIPLICATION LOGIC CIRCUIT

The present invention generally relates to digital electronic devices and more
particularly to a digital electronic device performing binary logic. In one aspect the
present invention relates to a parallel counter and in another aspect the present invention
relates to a multiplication logic circuit for multiplying two binary numbers.

Many applications require a block that adds n inputs together. The output of this block
is a binary representation of the number of high inputs. Such blocks, called parallel
counters (L. Dadda, Some Schemes for Parallel Multipliers, Alta Freq 34: 349-356 (1965);
E. E. Swartzlander Jr., Parallel Counters, IEEE Trans. Comput. C-22: 1021-1024 (1973)),
are used in circuits performing binary multiplication. Parallel counters have other
applications, for instance, majority-voting decoders or RSA encoders and decoders. It is
important to have an implementation of a parallel counter that achieves maximal speed.
It is known to use parallel counters in multiplication (L. Dadda, On Parallel Digital
Multipliers, Alta Freq 45: 574-580 (1976)).

A full adder is a special parallel counter with a three-bit input and a two-bit output. A
current implementation of higher parallel counters, i.e. with a larger number of inputs, is
based on using full adders (C. C. Foster and F. D. Stockton, Counting Responders in an
Associative Memory, IEEE Trans. Comput. C-20: 1580-1583 (1971)). In general, the
least significant bit of an output is the fastest bit to produce in such an implementation,
while the other bits are usually slower.

The following notation is used for logical operations:

$\oplus$: Exclusive OR;
$\vee$: OR;
$\wedge$: AND;
$\neg$: NOT.



An efficient prior art design (Foster and Stockton) of a parallel counter uses full adders.
A full adder, denoted FA, is a three-bit input parallel counter shown in figure 1. It has
three inputs $X_1$, $X_2$, $X_3$, and two outputs S and C. Logical expressions for the outputs are

$$S = X_1 \oplus X_2 \oplus X_3,$$

$$C = (X_1 \wedge X_2) \vee (X_1 \wedge X_3) \vee (X_2 \wedge X_3).$$

A half adder, denoted HA, is a two-bit input parallel counter shown in figure 1. It has
two inputs $X_1$, $X_2$ and two outputs S and C. Logical expressions for the outputs are

$$S = X_1 \oplus X_2,$$

$$C = X_1 \wedge X_2.$$
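Both sets of expressions can be checked exhaustively. The following Python sketch (the function names `full_adder` and `half_adder` are our own, chosen for illustration) evaluates the logical expressions above and confirms that each block is a parallel counter, i.e. that $2C + S$ equals the number of high inputs:

```python
from itertools import product


def full_adder(x1: int, x2: int, x3: int) -> tuple[int, int]:
    """Return (S, C) of a full adder: S = X1 xor X2 xor X3, C = majority."""
    s = x1 ^ x2 ^ x3
    c = (x1 & x2) | (x1 & x3) | (x2 & x3)
    return s, c


def half_adder(x1: int, x2: int) -> tuple[int, int]:
    """Return (S, C) of a half adder: S = X1 xor X2, C = X1 and X2."""
    return x1 ^ x2, x1 & x2


# Both blocks are parallel counters: 2*C + S is the number of high inputs.
for bits in product((0, 1), repeat=3):
    s, c = full_adder(*bits)
    assert 2 * c + s == sum(bits)
for bits in product((0, 1), repeat=2):
    s, c = half_adder(*bits)
    assert 2 * c + s == sum(bits)
```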

A prior art implementation of a seven-bit input parallel counter is illustrated in figure 2.
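As a hedged sketch of how such a counter can be wired from four full adders, the Python function below (`counter7_fa` is a hypothetical name) reduces seven bits to a three-bit count. This is one common arrangement, assumed here for illustration; the exact wiring of figure 2 is not reproduced and may differ:

```python
from itertools import product


def full_adder(x1: int, x2: int, x3: int) -> tuple[int, int]:
    return x1 ^ x2 ^ x3, (x1 & x2) | (x1 & x3) | (x2 & x3)


def counter7_fa(x: tuple[int, ...]) -> tuple[int, int, int]:
    """Seven-input parallel counter built from four full adders.

    Returns (S2, S1, S0), most significant bit first.
    """
    s_a, c_a = full_adder(x[0], x[1], x[2])  # weight-1 sum, weight-2 carry
    s_b, c_b = full_adder(x[3], x[4], x[5])
    s0, c_c = full_adder(s_a, s_b, x[6])     # S0 is ready after two FA levels
    s1, s2 = full_adder(c_a, c_b, c_c)       # all three carries have weight 2
    return s2, s1, s0


for bits in product((0, 1), repeat=7):
    s2, s1, s0 = counter7_fa(bits)
    assert 4 * s2 + 2 * s1 + s0 == sum(bits)
```

In this arrangement $S_0$ emerges after two full-adder levels while $S_1$ and $S_2$ need a third, which illustrates why the least significant bit is the fastest to produce.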

Multiplication is a fundamental operation. Given two n-digit binary numbers

$$A_{n-1}2^{n-1} + A_{n-2}2^{n-2} + \dots + A_1 2 + A_0 \quad\text{and}\quad B_{n-1}2^{n-1} + B_{n-2}2^{n-2} + \dots + B_1 2 + B_0,$$

their product

$$P_{2n-1}2^{2n-1} + P_{2n-2}2^{2n-2} + \dots + P_1 2 + P_0$$

may have up to 2n digits. Logical circuits generating all $P_i$ as outputs generally follow
the scheme in figure 14. Wallace invented the first fast architecture for a multiplier,
now called the Wallace-tree multiplier (C. S. Wallace, A Suggestion for a Fast
Multiplier, IEEE Trans. Electron. Comput. EC-13: 14-17 (1964)). Dadda investigated
bit behaviour in a multiplier (L. Dadda, Some Schemes for Parallel Multipliers, Alta Freq
34: 349-356 (1965)). He constructed a variety of multipliers, and most multipliers
follow Dadda's scheme.

Dadda's multiplier uses the scheme in figure 14. If the inputs have 8 bits, then 64 parallel
AND gates generate the array shown in figure 15. The AND gate sign $\wedge$ is omitted for
clarity so that $A_i \wedge B_j$ becomes $A_iB_j$. The rest of figure 15 illustrates array reduction, which
involves full adders (FA) and half adders (HA). Bits from the same column are added by
half adders or full adders. Some groups of bits fed into a full adder are in rectangles.
Some groups of bits fed into a half adder are in ovals. The result of array reduction is
just two binary numbers to be added at the last step. One adds these two numbers by one
of the fast addition schemes, for instance, a conditional adder or a carry-look-ahead adder.
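The three steps (array generation, array reduction, final addition) can be illustrated with a short Python sketch. It is a simplified, hypothetical model rather than Dadda's exact scheme: the reduction uses only full adders applied greedily, layer by layer, and Python's integer addition stands in for the fast final adder. The name `multiply_by_column_reduction` is our own.

```python
from collections import defaultdict


def multiply_by_column_reduction(a: int, b: int, n: int = 8) -> int:
    """Multiply two n-bit unsigned numbers: AND array, FA reduction, final add.

    Greedy layer-by-layer reduction with full adders only (an assumption);
    Dadda's exact placement of full and half adders differs.
    """
    # Array generation: bit A_i AND B_j lands in column i + j (weight 2**(i + j)).
    cols = defaultdict(list)
    for i in range(n):
        for j in range(n):
            cols[i + j].append((a >> i) & (b >> j) & 1)

    # Array reduction: each layer sends groups of three bits to full adders.
    while max(len(c) for c in cols.values()) > 2:
        nxt = defaultdict(list)
        for w, col in list(cols.items()):
            full = len(col) // 3
            for k in range(full):
                x1, x2, x3 = col[3 * k:3 * k + 3]
                nxt[w].append(x1 ^ x2 ^ x3)                            # sum bit
                nxt[w + 1].append((x1 & x2) | (x1 & x3) | (x2 & x3))  # carry bit
            nxt[w].extend(col[3 * full:])  # leftover bits pass through unchanged
        cols = nxt

    # Final step: add the two remaining rows (Python '+' stands in for a
    # conditional or carry-look-ahead adder).
    row0 = sum(c[0] << w for w, c in cols.items() if len(c) > 0)
    row1 = sum(c[1] << w for w, c in cols.items() if len(c) > 1)
    return row0 + row1
```

Each full adder preserves the weighted sum of the array ($x_1 + x_2 + x_3 = S + 2C$) and removes one bit, so the loop always terminates with at most two bits per column.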

In accordance with the first aspect, the present invention provides a parallel counter
which is based on algebraic properties of symmetric functions. Each of the plurality of
binary output bits is generated as a symmetric function of a plurality of binary input bits.

The symmetric functions comprise logically AND combining sets of one or more binary
inputs and logically OR or exclusive OR combining the logically combined sets of
binary inputs to generate a binary output. The OR and the exclusive OR symmetric
functions are elementary symmetric functions, and the generated output binary bit
depends only on the number of high inputs among the input binary bits. For the OR
symmetric function, if the number of high inputs is m, the output is high if and only if
$m \ge k$, where k is the size of the sets of binary inputs. Similarly, the output binary bit
generated using the exclusive OR symmetric function is high if and only if $m \ge k$ and the
number of k-element subsets of the set of high inputs is odd. The size of the
sets can be selected. The i-th output bit can be generated using the symmetric function
with exclusive OR logic by selecting the set size to be $2^{i-1}$, where i is an integer
from 1 to N, N is the number of binary outputs, and i represents the significance of each
binary output.

The sets of binary inputs used in the symmetric functions are each unique, and together
they cover all possible combinations of binary inputs of the chosen set size.

Thus in one embodiment of the present invention, each of the binary outputs can be
generated using a symmetric function which uses exclusive OR logic. However,
exclusive OR logic is not as fast as OR logic.

Thus in accordance with an embodiment of the present invention at least one of the
binary outputs is generated as a symmetric function of the binary inputs using OR logic
for combining a plurality of sets of one or more binary inputs. The logic is arranged to
logically AND members of each set of binary inputs and logically OR the results of the
AND operations.

The symmetric function using OR logic is faster and can be used for
generation of the most significant output bit. In such an embodiment the set size is set
to be $2^{N-1}$, where N is the number of binary outputs and the N-th binary output is the most
significant.

It is also possible to use the symmetric function using OR logic for less significant bits
on the basis of the output value of a more significant bit. In such a case, a plurality of
possible binary outputs for a binary output less significant than the N-th are generated as
symmetric functions of the binary inputs using OR logic for combining a plurality of
sets of one or more binary inputs, where N is the number of binary outputs. Selector
logic is provided to select one of the possible binary outputs based on a more significant
binary output value. The sizes of the sets used in such an arrangement for the (N-1)-th bit
are preferably $2^{N-1} + 2^{N-2}$ and $2^{N-2}$ respectively, and one of the possible binary outputs is
selected based on the N-th binary output value.

In one embodiment of the present invention the circuit is designed in a modular form. A
plurality of subcircuit logic modules are designed, each for generating intermediate
binary outputs as a symmetric function of some of the binary inputs. Logic is also
provided in this embodiment for logically combining the intermediate binary outputs to
generate the binary outputs.

Since OR logic is faster, in a preferred embodiment the subcircuit logic modules
implement the symmetric functions using OR logic. In one embodiment the subcircuit
modules can be used for generating some binary outputs, and one or more logic modules
can be provided for generating other binary outputs, in which each logic module
generates a binary output as a symmetric function of the binary inputs using exclusive OR
logic for combining a plurality of sets of one or more binary inputs.
Thus this aspect of the present invention provides a fast circuit that can be used in any
architecture using parallel counters. The design is applicable to any type of technology
from which the logic circuit is built.

The parallel counter in accordance with this aspect of the present invention is generally 
applicable and can be used in a multiplication circuit that is significantly faster than 
prior art implementations. 

In accordance with the second aspect of the present invention, a technique for
multiplying two N-bit binary numbers comprises an array generation step in which an array
of logical combinations between the bits of the two binary numbers is generated which
is of reduced size compared to the prior art.

In accordance with this aspect of the present invention, a logic circuit for multiplying
two N-bit numbers comprises array generation logic for performing the logical AND
operation between each bit in one binary number and each bit in the other binary number to
generate an array of logical AND combinations comprising an array of binary values,
and for further logically combining logically adjacent values to reduce the maximum
depth of the array to below N bits; array reduction logic for reducing the depth of the
array to two binary numbers; and addition logic for adding the binary values of the two
binary numbers.

When two binary numbers are multiplied together, as is conventional, each bit $A_i$ of the
first binary number is logically AND combined with each bit $B_j$ of the second number to
generate the array, which comprises a sequence of binary numbers represented by the
logical AND combinations $A_i$ AND $B_j$. The further logical combinations are carried
out by logically combining the combinations $A_1$ AND $B_{N-2}$, $A_1$ AND $B_{N-1}$, $A_0$ AND $B_{N-2}$,
and $A_0$ AND $B_{N-1}$, where N is the number of bits in the binary numbers. In this way
the size of the maximal column of numbers to be added together in the array is reduced.



More specifically, the array generation logic is arranged to combine the combinations $A_1$
AND $B_{N-2}$ and $A_0$ AND $B_{N-1}$ using exclusive OR logic to replace these combinations, and
to combine $A_1$ AND $B_{N-1}$ and $A_0$ AND $B_{N-2}$ to replace the $A_1$ AND $B_{N-1}$ combination.

In one embodiment of the present invention the array reduction logic can include at least 
one of: at least one full adder, at least one half adder, and at least one parallel counter. 
The or each parallel counter can comprise the parallel counter in accordance with the 
first aspect of the present invention. 

The second aspect of the present invention provides a reduction of the maximal column
length in the array, thereby reducing the number of steps required for array reduction.
When the first aspect of the present invention is used in conjunction with the second 
aspect of the present invention, an even more efficient multiplication circuit is provided. 

Embodiments of the present invention will now be described with reference to the 
accompanying drawings, in which: 

Figure 1 is a schematic diagram of a full adder and a half adder in accordance with the 
prior art. 

Figure 2 is a schematic diagram of a parallel counter using full adders in accordance
with the prior art.

Figure 3 is a schematic diagram illustrating the logic modules executing the symmetric
functions for the generation of binary outputs and the multiplexor (selector) used for
selecting outputs.

Figure 4 is a diagram illustrating the logic for implementing the symmetric function
OR_3_1.

Figure 5 is a diagram illustrating the logic for implementing the symmetric function
OR_4_1.

Figure 6 is a diagram illustrating the logic for implementing the symmetric function
OR_5_1 using two three-input OR gates.

Figure 7 is a diagram illustrating the logic for implementing the symmetric function
EXOR_7_1 using two-input exclusive OR gates.

Figure 8 is a diagram illustrating the logic for implementing the symmetric function
OR_3_2.

Figure 9 is a diagram illustrating the logic for implementing the symmetric function
EXOR_5_3.

Figure 10 is a diagram illustrating a parallel counter using the two types of symmetric
functions and having seven inputs and three outputs.

Figure 11 is a diagram illustrating splitting of the symmetric function OR_7_2 into sub
modules to allow the reuse of smaller logic blocks.

Figure 12 is a diagram of a parallel counter using the EXOR_7_1 symmetric function
for the generation of the least significant output bit from all of the input bits, and
smaller modules implementing symmetric functions using OR logic to generate the
second and third output bits.

Figure 13 is another diagram of a parallel counter similar to that of Figure 12 except
that the partitioning of the inputs is chosen differently to use different functional sub
modules.

Figure 14 is a diagram of the steps used in the prior art for multiplication.

Figure 15 is a schematic diagram of the process of Figure 14 in more detail.

Figure 16 is a diagram illustrating the properties of diagonal regions in the array.

Figure 17 is a diagram illustrating array deformation in accordance with the embodiment
of the present invention and the subsequent steps of array reduction and adding.

Figure 18 is a diagram of logic used in this embodiment for array generation.

The first aspect of the present invention will now be described. 

The first aspect of the present invention relates to a parallel counter counting the number
of high values in a binary number. The counter has i outputs and n inputs, where i is
determined as being the integer part of $\log_2 n$ plus 1.

A mathematical basis for the first aspect of the present invention is a theory of
symmetric functions. We denote by $C_n^k$ the number of distinct k-element subsets of a set
of n elements. We consider two functions EXOR_n_k and OR_n_k of n variables $X_1$,
$X_2$, ..., $X_n$ given by

$$\mathrm{EXOR\_n\_k}(X_1, X_2, \dots, X_n) = \bigoplus (X_{i_1} \wedge X_{i_2} \wedge \dots \wedge X_{i_k}),$$

$$\mathrm{OR\_n\_k}(X_1, X_2, \dots, X_n) = \bigvee (X_{i_1} \wedge X_{i_2} \wedge \dots \wedge X_{i_k}),$$

where $\{X_{i_1}, X_{i_2}, \dots, X_{i_k}\}$ runs over all possible subsets of $\{X_1, X_2, \dots, X_n\}$ that contain
precisely k elements. Blocks that produce such outputs are shown in figure 3.

The functions EXOR_n_k and OR_n_k are elementary symmetric functions. Their
values depend only on the number of high inputs among $X_1, X_2, X_3, \dots, X_n$. More
precisely, if m is the number of high inputs among $X_1, X_2, X_3, \dots, X_n$, then
OR_n_k($X_1, X_2, \dots, X_n$) is high if and only if $m \ge k$. Similarly, EXOR_n_k($X_1, X_2, \dots, X_n$)
is high if and only if $m \ge k$ and $C_m^k$ is odd.
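These definitions and properties can be checked by brute force. The Python sketch below (the names `or_n_k` and `exor_n_k` are our own, purely illustrative) evaluates both functions literally from the subset definition and confirms, for n = 7 and every k, that they depend only on m in the way stated. It models the logic, not the gate-level cost, which grows with the number of subsets.

```python
from itertools import combinations, product
from math import comb


def or_n_k(xs: tuple[int, ...], k: int) -> int:
    """OR_n_k: OR over all k-element subsets of the AND of their members."""
    return int(any(all(xs[i] for i in s) for s in combinations(range(len(xs)), k)))


def exor_n_k(xs: tuple[int, ...], k: int) -> int:
    """EXOR_n_k: exclusive OR over all k-element subsets of the AND of their members."""
    acc = 0
    for s in combinations(range(len(xs)), k):
        acc ^= int(all(xs[i] for i in s))
    return acc


# Check the stated properties for n = 7: both functions depend only on m,
# the number of high inputs.
n = 7
for xs in product((0, 1), repeat=n):
    m = sum(xs)
    for k in range(1, n + 1):
        assert or_n_k(xs, k) == int(m >= k)
        assert exor_n_k(xs, k) == int(m >= k and comb(m, k) % 2 == 1)
```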

Although EXOR_n_k and OR_n_k look similar, OR_n_k is much faster to produce 
since EXOR-gates are slower than OR-gates. 

In the above representation n is the number of inputs and k is the size of the subset of
inputs selected. Each set of k inputs is a unique set, and the subsets comprise all
possible k-element subsets of the set of inputs. For example, the symmetric function OR_3_1 has
three inputs $X_1$, $X_2$ and $X_3$ and the set size is 1. Thus the sets comprise $X_1$, $X_2$ and $X_3$.
These sets are then logically OR combined to generate the binary output. The
logic for performing this function is illustrated in Figure 4.

Figure 5 illustrates the logic for performing the symmetric function OR_4_1.

When the number of inputs becomes large, it may not be possible to use simple logic.

Figure 6 illustrates the use of two OR gates for implementing the symmetric function
OR_5_1.



Figure 7 similarly illustrates the logic for performing EXOR_7_1. The sets comprise
the inputs $X_1$, $X_2$, $X_3$, $X_4$, $X_5$, $X_6$ and $X_7$. These inputs are input into three levels of
exclusive OR gates.

When k is greater than 1, the inputs in a subset must be logically AND combined.
Figure 8 illustrates logic for performing the symmetric function OR_3_2. The inputs $X_1$
and $X_2$ comprise the first set and are input to a first AND gate. The inputs $X_1$ and $X_3$
constitute a second set and are input to a second AND gate. The inputs $X_2$ and $X_3$
constitute a third set and are input to a third AND gate. The outputs of the AND gates
are input to an OR gate to generate the output function.

Figure 9 is a diagram illustrating the logic for performing the symmetric function
EXOR_5_3. To perform this function, the subsets of size 3 of the set of five inputs
comprise ten sets, and ten AND gates are required. The outputs of the AND gates are
input to an exclusive OR gate to generate the function.

The specific logic to implement the symmetric functions will be technology dependent.
Thus the logic can be designed in accordance with the technology to be used.

In accordance with a first embodiment of the present invention, each output of the parallel
counter is generated using a symmetric function using exclusive OR logic.

Let the parallel counter have n inputs $X_1, \dots, X_n$ and t+1 outputs $S_t, S_{t-1}, \dots, S_0$. $S_0$ is the
least significant bit and $S_t$ is the most significant bit. For all i from 0 to t,

$$S_i = \mathrm{EXOR\_n\_}{2^i}(X_1, X_2, \dots, X_n).$$

It can thus be seen that for a seven-bit input, i.e. n = 7, i will have values of 0, 1 and 2.
Thus to generate the output $S_0$ the function will be EXOR_7_1, to generate the output $S_1$
the function will be EXOR_7_2, and to generate the output $S_2$ the function will be
EXOR_7_4. Thus for the least significant bit the set size (k) is 1, for the second bit the
set size is 2, and for the most significant bit the set size is 4. Clearly the logic required
for the more significant bits becomes more complex and thus slower to implement.
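A brute-force sketch of this first embodiment follows. It relies on a standard fact not stated above: by Lucas's theorem, $C_m^{2^i}$ is odd exactly when bit i of m is 1, which is why $S_i$ = EXOR_n_$2^i$ yields the binary count. The helper names `exor_n_k` and `counter_exor` are illustrative, and the subset enumeration models the logic rather than its gate count.

```python
from itertools import combinations, product


def exor_n_k(xs, k):
    """EXOR_n_k evaluated literally from its subset definition."""
    acc = 0
    for s in combinations(range(len(xs)), k):
        acc ^= int(all(xs[i] for i in s))
    return acc


def counter_exor(xs):
    """First embodiment: S_i = EXOR_n_(2**i) for i = 0..t, t = floor(log2 n)."""
    t = len(xs).bit_length() - 1  # so there are t + 1 = floor(log2 n) + 1 outputs
    return [exor_n_k(xs, 2 ** i) for i in range(t + 1)]  # [S_0, S_1, ..., S_t]


for xs in product((0, 1), repeat=7):
    bits = counter_exor(xs)
    assert sum(b << i for i, b in enumerate(bits)) == sum(xs)
```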

Thus in accordance with a second embodiment of the present invention, the most 
significant output bit is generated using a symmetric function using OR logic. 

This is more practical since OR_n_k functions are faster than EXOR_n_k functions. For
the most significant output bit

$$S_t = \mathrm{OR\_n\_}{2^t}(X_1, X_2, \dots, X_n).$$

In particular, with a seven-bit input

$$S_2 = \mathrm{OR\_7\_4}(X_1, X_2, X_3, X_4, X_5, X_6, X_7).$$

Thus in this second embodiment of the present invention the most significant bit is
generated using a symmetric function using OR logic, whereas the other bits are
generated using symmetric functions which use exclusive OR logic.

A third embodiment will now be described in which intermediate bits are generated 
using symmetric functions using OR logic. 

An arbitrary output bit can be expressed using OR_n_k functions if one knows the bits that
are more significant. For instance, the second most significant bit is given by

$$S_{t-1} = (S_t \wedge \mathrm{OR\_n\_}{2^t+2^{t-1}}) \vee ((\neg S_t) \wedge \mathrm{OR\_n\_}{2^{t-1}}).$$

In particular, with a seven-bit input

$$S_1 = (S_2 \wedge \mathrm{OR\_7\_6}(X_1, X_2, X_3, X_4, X_5, X_6, X_7)) \vee ((\neg S_2) \wedge \mathrm{OR\_7\_2}(X_1, X_2, X_3, X_4, X_5, X_6, X_7)).$$

A further reduction is

$$S_1 = \mathrm{OR\_7\_6}(X_1, X_2, X_3, X_4, X_5, X_6, X_7) \vee ((\neg S_2) \wedge \mathrm{OR\_7\_2}(X_1, X_2, X_3, X_4, X_5, X_6, X_7)).$$

A multiplexer MU, shown in figure 3, implements this logic. It has two inputs $X_0$, $X_1$, a
control C, and an output Z determined by the formula

$$Z = (C \wedge X_1) \vee ((\neg C) \wedge X_0).$$
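The selector formulas can be checked with a small Python model of a seven-input counter that uses OR logic for $S_2$ and $S_1$ and EXOR_7_1 (the parity of the inputs) for $S_0$. The names `or_n_k`, `mux` and `counter7` are our own; the sketch mirrors the formulas, not necessarily the exact gate arrangement of figure 10.

```python
from functools import reduce
from itertools import combinations, product
from operator import xor


def or_n_k(xs, k):
    """OR_n_k from its definition: some k-element subset of inputs is all high."""
    return int(any(all(xs[i] for i in s) for s in combinations(range(len(xs)), k)))


def mux(x0, x1, c):
    """Multiplexer MU: Z = (C AND X1) OR ((NOT C) AND X0)."""
    return (c & x1) | ((1 - c) & x0)


def counter7(xs):
    """Seven-input counter: OR logic for S2 and S1, EXOR_7_1 (parity) for S0."""
    s2 = or_n_k(xs, 4)                          # S2 = OR_7_4
    s1 = mux(or_n_k(xs, 2), or_n_k(xs, 6), s2)  # S2 selects OR_7_6 or OR_7_2
    s0 = reduce(xor, xs)                        # S0 = EXOR_7_1
    return s2, s1, s0


for xs in product((0, 1), repeat=7):
    s2, s1, s0 = counter7(xs)
    assert 4 * s2 + 2 * s1 + s0 == sum(xs)
    # The further reduction S1 = OR_7_6 OR (NOT S2 AND OR_7_2) agrees with the mux form.
    assert s1 == or_n_k(xs, 6) | ((1 - s2) & or_n_k(xs, 2))
```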
It is not practical to use either EXOR_n_k functions or OR_n_k functions exclusively. It
is optimal to use OR_n_k functions for a few most significant bits and EXOR_n_k
functions for the remaining bits. The fastest 7-input parallel counter in TSMC.25
is shown in figure 10.

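Future technologies that have fast OR_15_8 blocks would allow building a parallel
counter with 15 inputs. A formula for the third most significant bit using OR_n_m functions
is thus:

$$\begin{aligned}
S_{t-2} ={}& (S_t \wedge S_{t-1} \wedge \mathrm{OR\_n\_}{2^t+2^{t-1}+2^{t-2}}) \vee (S_t \wedge (\neg S_{t-1}) \wedge \mathrm{OR\_n\_}{2^t+2^{t-2}}) \\
&\vee ((\neg S_t) \wedge S_{t-1} \wedge \mathrm{OR\_n\_}{2^{t-1}+2^{t-2}}) \vee ((\neg S_t) \wedge (\neg S_{t-1}) \wedge \mathrm{OR\_n\_}{2^{t-2}}).
\end{aligned}$$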
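Because every OR_n_k depends only on the number m of high inputs, the formula can be checked by iterating over m instead of all $2^{15}$ input patterns. The sketch below (helper names `or_m` and `top_three_bits` are hypothetical) evaluates $S_t$, $S_{t-1}$ and $S_{t-2}$ for t = 3, the 15-input case, using only the threshold property $m \ge k$ together with the selector formulas:

```python
def or_m(m: int, k: int) -> int:
    """Value of OR_n_k when m of the n inputs are high (the m >= k property)."""
    return int(m >= k)


def top_three_bits(m: int, t: int) -> tuple[int, int, int]:
    """S_t, S_(t-1), S_(t-2) computed only from OR_n_k values and selectors."""
    s_t = or_m(m, 2 ** t)
    s_t1 = (s_t & or_m(m, 2 ** t + 2 ** (t - 1))) | ((1 - s_t) & or_m(m, 2 ** (t - 1)))
    s_t2 = (
        (s_t & s_t1 & or_m(m, 2 ** t + 2 ** (t - 1) + 2 ** (t - 2)))
        | (s_t & (1 - s_t1) & or_m(m, 2 ** t + 2 ** (t - 2)))
        | ((1 - s_t) & s_t1 & or_m(m, 2 ** (t - 1) + 2 ** (t - 2)))
        | ((1 - s_t) & (1 - s_t1) & or_m(m, 2 ** (t - 2)))
    )
    return s_t, s_t1, s_t2


# A 15-input counter has t + 1 = 4 outputs, so t = 3; check every possible count.
for m in range(16):
    assert top_three_bits(m, 3) == ((m >> 3) & 1, (m >> 2) & 1, (m >> 1) & 1)
```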

A fourth embodiment of the present invention will now be described, which divides the
logic block implementing the symmetric function into small blocks which can be
reused.

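An implementation of OR_7_2 is shown in figure 11. The 7 inputs are split into two
groups: five inputs from $X_1$ to $X_5$ and the two remaining inputs $X_6$ and $X_7$. The
following identity is then a basis for the implementation in figure 11:

$$\mathrm{OR\_7\_2}(X_1, \dots, X_7) = \mathrm{OR\_5\_2}(X_1, \dots, X_5) \vee (\mathrm{OR\_5\_1}(X_1, \dots, X_5) \wedge \mathrm{OR\_2\_1}(X_6, X_7)) \vee \mathrm{OR\_2\_2}(X_6, X_7).$$

One can write similar formulas for OR_7_4 and OR_7_6. Indeed,

$$\mathrm{OR\_7\_4}(X_1, \dots, X_7) = \mathrm{OR\_5\_4}(X_1, \dots, X_5) \vee (\mathrm{OR\_5\_3}(X_1, \dots, X_5) \wedge \mathrm{OR\_2\_1}(X_6, X_7)) \vee (\mathrm{OR\_5\_2}(X_1, \dots, X_5) \wedge \mathrm{OR\_2\_2}(X_6, X_7)),$$

$$\mathrm{OR\_7\_6}(X_1, \dots, X_7) = (\mathrm{OR\_5\_5}(X_1, \dots, X_5) \wedge \mathrm{OR\_2\_1}(X_6, X_7)) \vee (\mathrm{OR\_5\_4}(X_1, \dots, X_5) \wedge \mathrm{OR\_2\_2}(X_6, X_7)).$$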
Thus, it is advantageous to split variables and reuse smaller OR_n_k functions in a
parallel counter. For instance, an implementation of a parallel counter based on
partitioning seven inputs into groups of two and five is shown in figure 12.
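The three identities for the 5+2 partition can be confirmed exhaustively over all 128 input patterns. In this Python sketch `or_n_k` (our own name) evaluates OR_n_k literally from the subset definition, and `lo`/`hi` hold the groups $X_1, \dots, X_5$ and $X_6, X_7$:

```python
from itertools import combinations, product


def or_n_k(xs, k):
    """OR_n_k from its definition: some k-element subset of inputs is all high."""
    return int(any(all(xs[i] for i in s) for s in combinations(range(len(xs)), k)))


for xs in product((0, 1), repeat=7):
    lo, hi = xs[:5], xs[5:]  # X1..X5 and X6, X7
    assert or_n_k(xs, 2) == (
        or_n_k(lo, 2) | (or_n_k(lo, 1) & or_n_k(hi, 1)) | or_n_k(hi, 2))
    assert or_n_k(xs, 4) == (
        or_n_k(lo, 4) | (or_n_k(lo, 3) & or_n_k(hi, 1)) | (or_n_k(lo, 2) & or_n_k(hi, 2)))
    assert or_n_k(xs, 6) == (
        (or_n_k(lo, 5) & or_n_k(hi, 1)) | (or_n_k(lo, 4) & or_n_k(hi, 2)))
```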
Similarly, one can partition seven inputs into groups of four and three. An
implementation of the parallel counter based on this partition is shown in figure 13. One uses
the following logic formulas in this implementation:

$$\mathrm{OR\_7\_2}(X_1, \dots, X_7) = \mathrm{OR\_4\_2}(X_1, X_2, X_3, X_4) \vee (\mathrm{OR\_4\_1}(X_1, X_2, X_3, X_4) \wedge \mathrm{OR\_3\_1}(X_5, X_6, X_7)) \vee \mathrm{OR\_3\_2}(X_5, X_6, X_7),$$

$$\mathrm{OR\_7\_4}(X_1, \dots, X_7) = \mathrm{OR\_4\_4}(X_1, X_2, X_3, X_4) \vee (\mathrm{OR\_4\_3}(X_1, X_2, X_3, X_4) \wedge \mathrm{OR\_3\_1}(X_5, X_6, X_7)) \vee (\mathrm{OR\_4\_2}(X_1, X_2, X_3, X_4) \wedge \mathrm{OR\_3\_2}(X_5, X_6, X_7)) \vee (\mathrm{OR\_4\_1}(X_1, X_2, X_3, X_4) \wedge \mathrm{OR\_3\_3}(X_5, X_6, X_7)),$$

$$\mathrm{OR\_7\_6}(X_1, \dots, X_7) = (\mathrm{OR\_4\_4}(X_1, X_2, X_3, X_4) \wedge \mathrm{OR\_3\_2}(X_5, X_6, X_7)) \vee (\mathrm{OR\_4\_3}(X_1, X_2, X_3, X_4) \wedge \mathrm{OR\_3\_3}(X_5, X_6, X_7)).$$
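All of these partition formulas are instances of one rule: OR_(a+b)_k is the OR, over j, of OR_a_j AND OR_b_(k-j), where OR_x_0 is taken as 1 and terms whose subset size exceeds its group are dropped. The Python sketch below states that rule with hypothetical helpers `split_terms` and `or_split` and checks it against the direct definition for both the 4+3 and 5+2 partitions:

```python
from itertools import combinations, product


def or_n_k(xs, k):
    """OR_n_k from its definition; k = 0 gives 1 because the empty AND is high."""
    return int(any(all(xs[i] for i in s) for s in combinations(range(len(xs)), k)))


def split_terms(k, a, b):
    """(j, k - j) pairs for OR_(a+b)_k = OR over j of (OR_a_j AND OR_b_(k-j)).

    Pairs whose subset size exceeds its group are dropped (those terms are 0).
    """
    return [(j, k - j) for j in range(k + 1) if j <= a and k - j <= b]


def or_split(xs, k, a):
    """Evaluate OR_n_k from the first a inputs and the remaining inputs separately."""
    lo, hi = xs[:a], xs[a:]
    return int(any(or_n_k(lo, j) & or_n_k(hi, r) for j, r in split_terms(k, a, len(hi))))


for xs in product((0, 1), repeat=7):
    for k in (2, 4, 6):
        assert or_split(xs, k, 4) == or_n_k(xs, k)  # partition 7 = 4 + 3
        assert or_split(xs, k, 5) == or_n_k(xs, k)  # partition 7 = 5 + 2
```

For example, `split_terms(6, 4, 3)` returns `[(3, 3), (4, 2)]`, the two terms of the OR_7_6 formula for the 4+3 partition.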

One needs a method to choose between the implementations in figures 12 and 13. Here
is a mnemonic rule for making a choice. If one or two inputs arrive significantly later, then
one should use the implementation in figure 12 based on the partition 7=5+2. Otherwise,
the implementation in figure 13 based on the partition 7=4+3 is probably optimal.

Parallel counters with 6, 5, and 4 inputs can be implemented according to the logic for
the seven-input parallel counter. Reducing the number of inputs decreases the area
significantly and increases the speed slightly. It is advantageous to implement a six-input
parallel counter using partitions of 6, 3 + 3 or 4 + 2.

A second aspect of the present invention comprises a technique for multiplication and 
this will be described hereinafter. 

Multiplication is a fundamental operation in digital circuits. Given two n-digit binary
numbers

$$A_{n-1}2^{n-1} + A_{n-2}2^{n-2} + \dots + A_1 2 + A_0 \quad\text{and}\quad B_{n-1}2^{n-1} + B_{n-2}2^{n-2} + \dots + B_1 2 + B_0,$$

their product

$$P_{2n-1}2^{2n-1} + P_{2n-2}2^{2n-2} + \dots + P_1 2 + P_0$$

has up to 2n digits. Logical circuits generating all $P_i$ as outputs generally follow the
scheme in figure 14. Wallace invented the first fast architecture for a multiplier, now
called the Wallace-tree multiplier (C. S. Wallace, A Suggestion for a Fast Multiplier,
IEEE Trans. Electron. Comput. EC-13: 14-17 (1964)). Dadda investigated bit
behaviour in a multiplier (L. Dadda, Some Schemes for Parallel Multipliers, Alta Freq
34: 349-356 (1965)). He constructed a variety of multipliers, and most multipliers
follow Dadda's scheme.

Dadda's multiplier uses the scheme in figure 14. If the inputs have 8 bits, then 64 parallel
AND gates generate the array shown in figure 15. The AND gate sign $\wedge$ is omitted for
clarity so that $A_i \wedge B_j$ becomes $A_iB_j$. The rest of figure 15 illustrates array reduction, which
involves full adders (FA) and half adders (HA). Bits from the same column are added by
half adders or full adders. Some groups of bits fed into a full adder are in rectangles.
Some groups of bits fed into a half adder are in ovals. The result of array reduction is
just two binary numbers to be added at the last step. One adds these two numbers by one
of the fast addition schemes, for instance, a conditional adder or a carry-look-ahead adder.

This aspect of the present invention comprises two preferred steps: array deformation
and array reduction using the parallel counter in accordance with the first aspect
of the present invention.

The process of array deformation will now be described. 

Some parts of the multiplication array, formed by $A_iB_j$ as in figure 15, have
interesting properties. One can write simple formulas for the sum of the bits in these
parts. Examples of such special parts are shown in figure 16. In general, choose an integer k;
those $A_iB_j$ in the array such that $|i - j - k| \le 1$
comprise a special part.

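Let $S_0, S_1, \dots$ be the bits of the sum of all the bits $A_iB_j$ shown in figure 16, each $A_iB_j$
carrying weight $2^{i+j}$. Then

$$S_0 = A_0 \wedge B_0,$$

$$S_1 = (A_1 \wedge B_0) \oplus (A_0 \wedge B_1),$$

$$S_2 = (A_1 \wedge B_1) \oplus (A_1 \wedge B_1 \wedge A_0 \wedge B_0),$$

$$S_{2k+1} = (A_{k+1} \wedge B_k) \oplus (A_k \wedge B_{k+1}) \oplus (A_k \wedge B_k \wedge A_{k-1} \wedge B_{k-1}) \quad \text{for all } k > 0,$$

$$S_{2k} = (A_k \wedge B_k) \oplus \bigl(A_{k-1} \wedge B_{k-1} \wedge ((A_k \wedge B_k) \vee (A_{k-2} \wedge B_{k-2} \wedge (A_k \vee B_k)))\bigr) \quad \text{for all } k > 1.$$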

These formulas show that the logic for summing the chosen entries in the array does not
get large, whereas if random numbers were summed, the logic for the (n + 1)-th bit would be
larger than the logic for the n-th bit.
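The closed formulas can be checked against a direct weighted sum. The Python sketch below assumes the special part with k = 0 (the bits $A_iB_j$ with $|i - j| \le 1$, which is the region the formulas for $S_0$, $S_1$, $S_2$ describe; figure 16 itself is not reproduced) and reads $A_i$, $B_i$ outside the array as 0, so that $S_{2n-1}$ is the final carry. The names `region_sum` and `region_bits` are hypothetical.

```python
from itertools import product


def region_sum(A, B):
    """Weighted sum of the bits A_i B_j with |i - j| <= 1 (the special part for k = 0)."""
    n = len(A)
    return sum((A[i] & B[j]) << (i + j)
               for i in range(n) for j in range(n) if abs(i - j) <= 1)


def region_bits(A, B):
    """Bits S_0 .. S_(2n-1) of that sum, computed only from the closed formulas."""
    n = len(A)

    def a(i):  # A_i, read as 0 outside 0..n-1
        return A[i] if 0 <= i < n else 0

    def b(i):  # B_i, read as 0 outside 0..n-1
        return B[i] if 0 <= i < n else 0

    s = [a(0) & b(0),
         (a(1) & b(0)) ^ (a(0) & b(1)),
         (a(1) & b(1)) ^ (a(1) & b(1) & a(0) & b(0))]
    for w in range(3, 2 * n):
        k = w // 2
        if w % 2:  # S_(2k+1), k > 0
            s.append((a(k + 1) & b(k)) ^ (a(k) & b(k + 1))
                     ^ (a(k) & b(k) & a(k - 1) & b(k - 1)))
        else:      # S_(2k), k > 1
            s.append((a(k) & b(k))
                     ^ (a(k - 1) & b(k - 1)
                        & ((a(k) & b(k)) | (a(k - 2) & b(k - 2) & (a(k) | b(k))))))
    return s


n = 6
for bits in product((0, 1), repeat=2 * n):
    A, B = bits[:n], bits[n:]
    value = sum(bit << w for w, bit in enumerate(region_bits(A, B)))
    assert value == region_sum(A, B)
```

Each output bit needs at most six inputs, whatever the position, which is the sense in which the logic "does not get large".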

Using these formulas, one can generate a different array. The shape of the array changes,
which is why this is called array deformation. These formulas are important because one can
speed up a multiplication circuit by generating an array of a particular shape.

The array in figure 17 is for an 8-bit multiplication. The AND gate sign $\wedge$ is omitted for
clarity so that $A_i \wedge B_j$ becomes $A_iB_j$. Array deformation logic generates X, Y, and Z:

$$X = (A_1 \wedge B_6) \oplus (A_0 \wedge B_7),$$

$$Y = A_1 \wedge B_7 \wedge \neg(A_0 \wedge B_6),$$

$$Z = A_1 \wedge B_7 \wedge A_0 \wedge B_6.$$
The advantage of this array over the one in figure 15 is that the maximal number of bits in a
column is smaller. The array in figure 15 has a column with 8 bits. The array in figure
17 has 4 columns with 7 bits but none with 8 or more bits. The logic for the generation
of X, Y and Z is illustrated in figure 18. This logic can be used in parallel with the first
two full adders (illustrated in Figure 2) in the array reduction step, thus avoiding delays
caused by additional logic.
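The effect of the deformation can be checked in Python. In the sketch below (`deformed_array` is a hypothetical name), X replaces $A_1B_6$ and $A_0B_7$ in column 7, Y replaces $A_1B_7$ in column 8, and Z is added to column 9; the checks confirm that the weighted sum still equals the product and that the tallest columns now hold 7 bits:

```python
from itertools import product


def deformed_array(a: int, b: int, n: int = 8) -> dict[int, list[int]]:
    """Columns of the n-bit AND array after the X, Y, Z deformation.

    X replaces A1B(n-2) and A0B(n-1) in column n-1, Y replaces A1B(n-1) in
    column n, and Z is added to column n+1 (n = 8 gives the formulas above).
    """
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> j) & 1 for j in range(n)]
    removed = {(1, n - 2), (0, n - 1), (1, n - 1)}
    cols: dict[int, list[int]] = {w: [] for w in range(2 * n - 1)}
    for i in range(n):
        for j in range(n):
            if (i, j) not in removed:
                cols[i + j].append(A[i] & B[j])
    cols[n - 1].append((A[1] & B[n - 2]) ^ (A[0] & B[n - 1]))       # X
    cols[n].append(A[1] & B[n - 1] & (1 - (A[0] & B[n - 2])))       # Y
    cols[n + 1].append(A[1] & B[n - 1] & A[0] & B[n - 2])           # Z
    return cols


heights = [len(c) for _, c in sorted(deformed_array(0, 0).items())]
assert max(heights) == 7 and heights.count(7) == 4

for a, b in product(range(256), range(0, 256, 7)):
    cols = deformed_array(a, b)
    assert sum(sum(c) << w for w, c in cols.items()) == a * b
```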

Array reduction is illustrated in figure 17. The first step utilizes 1 half adder, 3 full
adders, 1 parallel counter with 4 inputs, 2 parallel counters with 5 inputs, 1 parallel
counter with 6 inputs, and 4 parallel counters with 7 inputs. Three of these parallel counters
(in columns 7, 8, and 9) have an implementation based on the 7=5+2 partition. The bits X,
Y, and Z join the group of two in the partition. The counter in column 6 is implemented
on the 7=4+3 partition. The counter in column 5 is based on the 6=3+3 partition. The remaining
counters should not be partitioned. The locations of full adders are indicated by ovals.
The half adder is shown by a rectangle.

An adder for adding the final two binary numbers is designed based on the arrival times of
bits in the two numbers. This gives a slight advantage, but it is based on common
knowledge, that is, conditional adders and ripple-carry adders.

Although in this embodiment the multiplication of two 8-bit numbers has been illustrated, the
invention is applicable to the multiplication of any N-bit binary numbers. For example, for 16-bit
multiplication, the deformed array will reduce the middle column height from 16 to 15, thus
allowing two seven-input parallel counters to be used for the first layer to generate two 3-bit
outputs, and the left-over input can be used with the outputs of the first layer as an input to a
further seven-input parallel counter, thus allowing the addition of the 16 bits in only two layers.

The second aspect of the present invention can be used with the parallel counter of the
first aspect of the present invention to provide a fast circuit.

The parallel counter of the first aspect of the present invention has applications other
than use in the multiplier of the second aspect of the present invention. It can be used
in RSA and in reduced-area multipliers. Sometimes it is practical to build just a fragment
of the multiplier. This can happen when the array is too large, for instance in RSA
algorithms where multiplicands may have more than 1000 bits. This
fragment of a multiplier is then used repeatedly to reduce the array. In current
implementations, it consists of a collection of full adders. One can use 7-input parallel
counters followed by full adders instead.

A parallel counter can also be used in circuits for error correction codes. One can use a
parallel counter to produce the Hamming distance. This distance is useful in digital
communication. In particular, the Hamming distance has to be computed in certain types
of decoders, for instance, the Viterbi decoder or the majority-logic decoder.

Given two binary messages $(A_1, A_2, \dots, A_n)$ and $(B_1, B_2, \dots, B_n)$, the Hamming distance
between them is the number of indices i between 1 and n such that $A_i$ and $B_i$ are
different. This distance can be computed by a parallel counter whose n inputs are

$$(A_1 \oplus B_1, A_2 \oplus B_2, \dots, A_n \oplus B_n).$$
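A minimal Python sketch of this use follows. It feeds the exclusive OR of the two messages into a counter whose outputs are $S_i$ = EXOR_n_$2^i$, as in the first embodiment; the names `parallel_counter` and `hamming_distance` are illustrative, and the subset enumeration stands in for the counter's logic rather than modelling its gates.

```python
from itertools import combinations


def exor_n_k(xs, k):
    """EXOR_n_k from its subset definition."""
    acc = 0
    for s in combinations(range(len(xs)), k):
        acc ^= int(all(xs[i] for i in s))
    return acc


def parallel_counter(xs):
    """Counter with outputs S_i = EXOR_n_(2**i), returned as an integer."""
    t = len(xs).bit_length() - 1
    return sum(exor_n_k(xs, 2 ** i) << i for i in range(t + 1))


def hamming_distance(a_bits, b_bits):
    """Feed (A_1 xor B_1, ..., A_n xor B_n) into the parallel counter."""
    return parallel_counter([x ^ y for x, y in zip(a_bits, b_bits)])


print(hamming_distance([1, 0, 1, 1, 0, 0, 1], [0, 0, 1, 0, 1, 0, 1]))  # prints 3
```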

The multiply-and-add operation is fundamental in digital electronics because it underlies
filtering. Given 2n binary numbers $X_1, X_2, \dots, X_n, Y_1, Y_2, \dots, Y_n$, the result of this
operation is

$$X_1Y_1 + X_2Y_2 + \dots + X_nY_n.$$

One can use the multiplier described to implement multiply-and-add in hardware.
Another strategy is to use the scheme in figure 14. All partial products in the products
$X_iY_i$ generate an array. Then one uses the parallel counter to reduce the array.
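The second strategy can be sketched in Python: the partial products of every $X_iY_i$ are merged into one array, and each layer reduces every column in groups of at most seven bits with a parallel counter. Here `counter` is a behavioural stand-in for the parallel counter of the first aspect (it simply counts the high inputs), and `multiply_add` is a hypothetical name; Python's integer addition stands in for the final adder.

```python
from collections import defaultdict


def counter(bits):
    """Parallel counter stand-in: binary count of the high inputs, LSB first."""
    m = sum(bits)
    return [(m >> i) & 1 for i in range(len(bits).bit_length())]


def multiply_add(xs, ys, n=8):
    """X_1*Y_1 + ... + X_k*Y_k from one merged array of n-bit partial products."""
    cols = defaultdict(list)
    for x, y in zip(xs, ys):
        for i in range(n):
            for j in range(n):
                cols[i + j].append((x >> i) & (y >> j) & 1)
    while max((len(c) for c in cols.values()), default=0) > 2:
        nxt = defaultdict(list)
        for w, col in list(cols.items()):
            for g in range(0, len(col), 7):  # groups of at most seven bits
                group = col[g:g + 7]
                if len(group) <= 2:
                    nxt[w].extend(group)     # too few bits to be worth counting
                else:
                    for d, bit in enumerate(counter(group)):
                        nxt[w + d].append(bit)  # count bit d has weight 2**(w + d)
        cols = nxt
    return sum(bit << w for w, c in cols.items() for bit in c)
```

Each counter applied to three or more bits emits fewer bits than it consumes, so every layer shrinks the array until no column holds more than two bits.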

In the present invention, one can use the parallel counter whenever there is a need to add
an array of numbers. For instance, when multiplying negative numbers in two's-complement
form, one generates a different array by either Booth recoding (A. D. Booth, A Signed
Binary Multiplication Technique, Q. J. Mech. Appl. Math. 4: 236-240 (1951)) or
another method. To obtain a product one adds this array of numbers.



CLAIMS: 
1. A parallel counter comprising:

a plurality of inputs for receiving a binary number as a plurality of binary inputs; 

a plurality of outputs for outputting binary code indicating the number of binary 
ones in the plurality of binary inputs; and 

a logic circuit connected between the plurality of inputs and the plurality of
binary outputs and for generating each of the plurality of binary outputs as a symmetric
function of the binary inputs.

2. A parallel counter according to claim 1 wherein said logic circuit is arranged to
generate at least one of the binary outputs as a symmetric function of the binary inputs
using exclusive OR logic for combining a plurality of sets of one or more binary inputs.

3. A parallel counter according to claim 2 wherein said logic circuit is arranged to 
logically AND members of each set of binary inputs and to logically exclusively OR the 
result of the AND operations. 

4. A parallel counter according to claim 3 wherein said logic circuit is arranged to
logically AND $2^{i-1}$ of the binary inputs in each set for the generation of the i-th binary
output, where i is an integer from 1 to N, N is the number of binary outputs and i
represents the significance of each binary output, each set being unique and the sets
covering all possible combinations of binary inputs.

5. A parallel counter according to claim 3 wherein said logic circuit is arranged to
logically AND members of each set of binary inputs, where each set is unique and the
sets cover all possible combinations of binary inputs.

6. A parallel counter according to any preceding claim wherein said logic circuit is
arranged to generate at least one of the binary outputs as a symmetric function of the
binary inputs using OR logic for combining a plurality of sets of one or more binary
inputs.

7. A parallel counter according to claim 6 wherein said logic circuit is arranged to
logically AND members of each set of binary inputs and to logically OR the result of the
AND operations.

8. A parallel counter according to claim 7 wherein said logic circuit is arranged to
logically AND $2^{N-1}$ of the binary inputs in each set for the generation of the $N$th binary
output, where $N$ is the number of binary outputs and the $N$th binary output is the most
significant, each set being unique and the sets covering all possible combinations of
binary inputs.
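The OR form of claims 6 to 8 suits the most significant output because, when there are fewer than $2^N$ inputs, the count has its $N$th bit set exactly when at least $2^{N-1}$ inputs are high, that is, when some set of $2^{N-1}$ inputs is all high. A minimal Python sketch of that function, with a hypothetical helper name and assuming the number of inputs is below $2^N$:

```python
from itertools import combinations, product

def counter_msb_or(inputs, n_outputs):
    """N-th (most significant) output: OR of the ANDs of every set of
    2**(N-1) inputs. Assumes len(inputs) < 2**n_outputs."""
    size = 2 ** (n_outputs - 1)
    return int(any(all(s) for s in combinations(inputs, size)))

# Exhaustive check for 7 inputs and N = 3 (sets of 4 inputs).
for bits in product((0, 1), repeat=7):
    assert counter_msb_or(bits, 3) == (sum(bits) >> 2) & 1
```

Unlike the XOR form, an OR of many terms needs no parity tree, which is one plausible reason to prefer it for the slowest output.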

9. A parallel counter according to claim 7 wherein said logic circuit is arranged to
logically AND members of each set of binary inputs, where each set is unique and the
sets cover all possible combinations of binary inputs.

10. A parallel counter according to claim 1 wherein said logic circuit is arranged to
generate a first binary output as a symmetric function of the binary inputs using
exclusive OR logic for combining a plurality of sets of one or more binary inputs, and to
generate an $N$th binary output as a symmetric function of the binary inputs using OR
logic for combining a plurality of sets of one or more binary inputs.

11. A parallel counter according to any preceding claim wherein said logic circuit is
arranged to generate two possible binary outputs for a binary output less significant than
the $N$th binary output, as symmetric functions of the binary inputs using OR logic for
combining a plurality of sets of one or more binary inputs, where $N$ is the number of
binary outputs, the sets used for each possible binary output being of two different sizes
which are a function of the binary output being generated; and said logic circuit
including selector logic to select one of the possible binary outputs based on a more
significant binary output value.

12. A parallel counter according to claim 11 wherein said logic circuit is arranged to
generate the two possible binary outputs for the $(N-1)$th binary output less significant
than the $N$th binary output, as symmetric functions of the binary inputs using OR logic
for combining a plurality of sets of one or more binary inputs, the sets used for each
possible binary output being of size $2^{N-1} + 2^{N-2}$ and $2^{N-2}$ respectively and said selector
logic being arranged to select one of the possible binary outputs based on the $N$th binary
output value.
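The selection in claims 11 and 12 follows from splitting on the most significant output. If the $N$th bit is 1 the count is at least $2^{N-1}$, so the $(N-1)$th bit is 1 exactly when the count reaches $2^{N-1}+2^{N-2}$; if the $N$th bit is 0, the $(N-1)$th bit is 1 exactly when the count reaches $2^{N-2}$. Both candidates can be formed in parallel with the $N$th bit and then multiplexed. A Python sketch of the logic function, with hypothetical helper names:

```python
from itertools import combinations, product

def at_least(inputs, k):
    """OR of the ANDs of every set of k inputs: 1 iff at least k inputs are high."""
    return int(any(all(s) for s in combinations(inputs, k)))

def top_two_bits(inputs, n_outputs):
    """Return (N-th bit, (N-1)-th bit); assumes len(inputs) < 2**n_outputs."""
    n = n_outputs
    msb = at_least(inputs, 2 ** (n - 1))
    # Two candidates for the (N-1)-th bit, computed without waiting for msb.
    if_msb_high = at_least(inputs, 2 ** (n - 1) + 2 ** (n - 2))
    if_msb_low = at_least(inputs, 2 ** (n - 2))
    # Selector (2:1 multiplexer) driven by the more significant output.
    return msb, (if_msb_high if msb else if_msb_low)

for bits in product((0, 1), repeat=7):
    c = sum(bits)
    assert top_two_bits(bits, 3) == ((c >> 2) & 1, (c >> 1) & 1)
```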

13. A parallel counter according to any preceding claim wherein said logic circuit 
includes a plurality of subcircuit logic modules each generating intermediate binary 
outputs as a symmetric function of some of the binary inputs, and logic for logically 
combining the intermediate binary outputs to generate said binary outputs. 

14. A parallel counter according to claim 13 wherein said subcircuit logic modules 
are arranged to use OR logic for combining sets of said some of said binary inputs. 

15. A parallel counter according to claim 14 wherein said logic circuit includes one
or more logic modules each for generating a binary output as a symmetric function of
the binary inputs using exclusive OR logic for combining a plurality of sets of one or
more binary inputs.
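Claims 13 and 14 do not say how the inputs are divided among subcircuit modules. One possible decomposition, offered here only as an assumption, splits the inputs into two groups, computes OR-based threshold functions ("at least $a$ high") on each group, and combines them: at least $k$ of all inputs are high exactly when, for some $a+b=k$, at least $a$ are high in the first group and at least $b$ in the second. A Python sketch with hypothetical names:

```python
from itertools import combinations, product

def at_least(inputs, k):
    """Subcircuit module: OR of the ANDs of every k-subset (1 iff >= k high)."""
    if k <= 0:
        return 1
    return int(any(all(s) for s in combinations(inputs, k)))

def at_least_split(inputs, k, split):
    """Same threshold, built from modules on inputs[:split] and inputs[split:]
    whose intermediate outputs are combined with AND/OR logic."""
    left, right = inputs[:split], inputs[split:]
    return int(any(at_least(left, a) and at_least(right, k - a)
                   for a in range(k + 1)))

for bits in product((0, 1), repeat=7):
    for k in range(8):
        assert at_least_split(bits, k, 3) == at_least(bits, k)
```

Each module sees fewer inputs, so it needs far fewer AND terms than a single module over all inputs, at the cost of one extra AND/OR level.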

16. A logic circuit for multiplying two $N$-bit binary numbers, the logic circuit
comprising: 

array generation logic for performing the logical AND operation between each
bit in one binary number and each bit in the other binary number to generate an array of
logical AND combinations comprising an array of binary values, and for further logically
combining values to generate the array in which the maximal depth of the array is below
$N$ bits;

array reduction logic for reducing the depth of the array to two binary numbers; and

addition logic for adding the binary values of the two binary numbers.
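The three stages of claim 16 can be modelled in Python as follows. This is a sketch under stated assumptions: bits are indexed from 0, the array is the plain AND array (without the depth-reducing combination of claims 17 to 19), the reduction uses full and half adders (one of the options listed in claim 20), and Python's integer addition stands in for the final adder. All function names are hypothetical.

```python
def partial_products(a, b, n):
    """Array generation: column c collects every bit A_i AND B_j with i + j == c."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append((a >> i) & (b >> j) & 1)
    return cols

def reduce_to_two_rows(cols):
    """Array reduction: full adders (3 bits -> sum, carry) and half adders
    (2 leftover bits -> sum, carry) until no column holds more than two bits."""
    while max(len(col) for col in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for c, col in enumerate(cols):
            k = 0
            while len(col) - k >= 3:                  # full adder
                x, y, z = col[k:k + 3]
                nxt[c].append(x ^ y ^ z)
                nxt[c + 1].append((x & y) | (x & z) | (y & z))
                k += 3
            if len(col) > 2 and len(col) - k == 2:    # half adder on leftover pair
                x, y = col[k:k + 2]
                nxt[c].append(x ^ y)
                nxt[c + 1].append(x & y)
                k += 2
            nxt[c].extend(col[k:])                    # remaining bits pass through
        cols = nxt
    return cols

def multiply(a, b, n):
    cols = reduce_to_two_rows(partial_products(a, b, n))
    # Addition: read the (at most) two bits per column as two binary numbers.
    row0 = sum(col[0] << c for c, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << c for c, col in enumerate(cols) if len(col) > 1)
    return row0 + row1

print(multiply(13, 11, 4))
```

In the unmodified array the middle column holds $N$ bits; claim 16 requires generation logic that brings this below $N$ before reduction begins.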

17. A logic circuit according to claim 16 wherein said array generation logic is
arranged to perform the further logical combination of values for values formed by the
logical AND combination of each bit $A_i$ of one binary number and each bit $B_j$ of the
other binary number, where $i+j-k<1$, $k$ is a chosen integer, and $i$ and $j$ are integers from
1 to $N$.

18. A logic circuit according to claim 16 or claim 17 wherein said array generation
logic is arranged to logically AND combine each bit $A_i$ of the first binary number with
each bit $B_j$ of a second binary number to generate said array comprising a sequence of
binary numbers represented by said logical AND combinations, $A_i$ AND $B_j$, and to carry
out further logical combination by logically combining the combinations $A_1$ AND $B_{N-2}$,
$A_1$ AND $B_{N-1}$, where $N$ is the number of bits in the binary numbers.

19. A logic circuit according to claim 18 wherein said array generation logic is
arranged to combine the combinations $A_1$ AND $B_{N-2}$ and $A_0$ AND $B_{N-1}$ using exclusive
OR logic to replace these combinations, and to combine $A_1$ AND $B_{N-1}$ and $A_0$ AND $B_{N-2}$
to replace the $A_1$ AND $B_{N-1}$ combination.
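Claim 19 names the terms that are combined and the XOR for the first pair, but not the gate used to replace $A_1$ AND $B_{N-1}$. The Python sketch below assumes one consistent choice: column $N$ receives $(A_1 \wedge B_{N-1}) \wedge \neg(A_0 \wedge B_{N-2})$, and an extra term $A_0 \wedge A_1 \wedge B_{N-2} \wedge B_{N-1}$ is placed in column $N+1$. That extra term is this sketch's assumption, not something stated in the claim; it is what keeps the array's total value equal to the product under this reading, since $x + y = (x \oplus y) + 2xy$ and $u + uv = (u \wedge \neg v) + 2uv$. Bits are indexed from 0 and the names are hypothetical.

```python
def bit(x, k):
    return (x >> k) & 1

def modified_array(a, b, n):
    """AND array of two n-bit numbers (n >= 2) with the terms of claim 19
    recombined so that no column holds more than n - 1 bits."""
    replaced = {(1, n - 2), (0, n - 1), (1, n - 1)}
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if (i, j) not in replaced:
                cols[i + j].append(bit(a, i) & bit(b, j))
    x = bit(a, 1) & bit(b, n - 2)      # A1 AND B(N-2), column N-1
    y = bit(a, 0) & bit(b, n - 1)      # A0 AND B(N-1), column N-1
    u = bit(a, 1) & bit(b, n - 1)      # A1 AND B(N-1), column N
    v = bit(a, 0) & bit(b, n - 2)      # A0 AND B(N-2), stays in column N-2
    cols[n - 1].append(x ^ y)          # exclusive OR replaces x and y
    cols[n].append(u & (1 - v))        # replaces A1 AND B(N-1)
    cols[n + 1].append(u & v)          # assumed extra term (equals x AND y)
    return cols

def array_value(cols):
    return sum(sum(col) << c for c, col in enumerate(cols))

for n in range(2, 6):
    for a in range(2 ** n):
        for b in range(2 ** n):
            cols = modified_array(a, b, n)
            assert array_value(cols) == a * b
            assert max(len(col) for col in cols) == n - 1
```

Columns $N-1$, $N$ and $N+1$ each end up with $N-1$ bits, which matches the depth requirement of claim 16.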

20. A logic circuit according to any one of claims 16 to 19 wherein said array 
reduction logic includes at least one of: at least one full adder, at least one half adder, 
and at least one parallel counter. 

21. A logic circuit according to claim 20 wherein said array reduction logic includes
at least one parallel counter according to any one of claims 1 to 15.
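For claim 21, a column deeper than two bits can be fed to a parallel counter whose output of weight $2^k$ moves $k$ columns up. The Python sketch below builds the counter from the XOR-of-ANDs form of claims 2 to 4 and applies one counter per deep column per pass; the pass structure, the pass-through of shallow columns and all names are assumptions for illustration, and the plain AND array is used as input.

```python
from functools import reduce
from itertools import combinations
from operator import xor

def parallel_counter(inputs):
    """Output bit k (zero-based, least significant first) is the XOR of the
    ANDs of every set of 2**k inputs; k here corresponds to the claims' i - 1."""
    n_out = len(inputs).bit_length()
    return [reduce(xor, (int(all(s)) for s in combinations(inputs, 2 ** k)), 0)
            for k in range(n_out)]

def partial_products(a, b, n):
    """Column c holds every bit A_i AND B_j with i + j == c."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append((a >> i) & (b >> j) & 1)
    return cols

def reduce_with_counters(cols):
    """Compress every column deeper than two bits until two rows remain."""
    while max(len(col) for col in cols) > 2:
        pad = max(len(col) for col in cols).bit_length()
        nxt = [[] for _ in range(len(cols) + pad)]
        for c, col in enumerate(cols):
            if len(col) <= 2:
                nxt[c].extend(col)                  # shallow column passes through
            else:
                for k, out in enumerate(parallel_counter(col)):
                    nxt[c + k].append(out)          # weight 2**k moves k columns up
        cols = nxt
    return cols

def multiply_with_counters(a, b, n):
    cols = reduce_with_counters(partial_products(a, b, n))
    row0 = sum(col[0] << c for c, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << c for c, col in enumerate(cols) if len(col) > 1)
    return row0 + row1                              # final two-number addition

print(multiply_with_counters(13, 11, 4))
```

A three-input counter here behaves as a full adder, so this reduction contains the full-adder scheme as a special case.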